Abstract



This paper is concerned with asymptotic theory for the penalized spline estimator in the bivariate
additive model. The focus is on the penalized spline estimator obtained
by the backfitting algorithm. The convergence of the algorithm and the uniqueness
of its solution are shown. The asymptotic bias and variance of the penalized spline estimator
are derived by an efficient use of the asymptotic results for the penalized spline estimator in the
marginal univariate model. Asymptotic normality of the estimator is also developed, by which
an approximate confidence interval can be obtained. Some numerical experiments confirming
the theoretical results are provided.

1 Introduction

The additive model is a typical regression model with multidimensional covariates and is usually
expressed as

$$ y_i = f_1(x_{i1}) + \cdots + f_D(x_{iD}) + \varepsilon_i $$

for given data $\{(y_i, x_{i1}, \ldots, x_{iD}) : i = 1, \ldots, n\}$, where each $f_d$ $(d = 1, \ldots, D)$ is a univariate
function with a certain degree of smoothness. This paper focuses on the bivariate additive
model, in which $D = 2$.

The additive model has become a popular smoothing technique and its fundamental properties
have been summarized in the literature, for example in Buja et al. (1989) and Hastie and Tibshirani
(1990). Buja et al. (1989) proposed the so-called backfitting algorithm, which is efficient for
nonparametric estimation of $f_d$ $(d = 1, \ldots, D)$. The backfitting algorithm is an iterative updating
algorithm, and neither its convergence nor the uniqueness of its solution is always assured. Buja
et al. (1989) gave a sufficient condition for convergence of the backfitting algorithm and
the uniqueness of its solution for the bivariate additive model.

In this paper, we discuss the asymptotic properties of the penalized spline estimator for the 
additive model with D = 2. Unlike spline smoothing, the asymptotic results of kernel smoothing 
for the additive model have been obtained. Ruppert and Opsomer (1997) showed that a certain 
kernel smoothing for the additive model satisfies the sufficient condition for convergence of 
the backfitting algorithm and the uniqueness of its solution. Furthermore, they derived the 
asymptotic bias and variance of the kernel estimator for the bivariate additive model. 

Opsomer (2000) presented the sufficient condition for convergence of the backfitting algo- 
rithm and the uniqueness of its solution for the D-variate additive model in Lemma 2.1. The 
asymptotic bias and variance of the kernel estimator for the D-variate additive model were also 
derived under the assumption that the sufficient condition for convergence of the backfitting 
algorithm holds. Wand (1999) investigated asymptotic normality of the kernel estimator for
the D-variate additive model by elegant use of the results in Opsomer (2000). We observe
from Wand's results on asymptotic normality that the kernel estimators of the $f_d$'s are asymptotically
independent.

Many researchers have explored the effectiveness of spline smoothing, such as Wahba (1975)
and Green and Silverman (1994). Penalized spline estimators have been discussed in O'Sullivan
(1986), Eilers and Marx (1996), Marx and Eilers (1998) and Ruppert et al. (2003). Despite the
richness of its applications, the asymptotics for spline smoothing do not seem to have been sufficiently
developed yet.

For the univariate model (D = 1), Agarwal and Studden (1980) and Zhou et al. (1998)
obtained important asymptotic results for the regression spline. Hall and Opsomer (2005) gave
the mean squared error and consistency of the penalized spline estimator. The asymptotic
bias and variance of the penalized spline estimator were obtained in Claeskens et al. (2009).
Kauermann et al. (2009) worked with the generalized linear model. Wang et al. (2011) showed
that the penalized spline estimator is asymptotically equivalent to a Nadaraya-Watson estimator.
Thus, asymptotic theory for the penalized spline has been developed relatively
recently, and those works mainly concern the univariate model (D = 1).
In the case of multidimensional covariates, Stone (1985) showed the consistency of the regression
spline in the D-variate additive model, but that estimator is not a penalized spline.

The aim of this paper is to derive asymptotic bias, asymptotic variance, and asymptotic 
distribution of the penalized spline estimator in the bivariate additive model. The penalized 
spline estimator for the bivariate additive model is obtained using the penalized least squares 
method and the backfitting algorithm. The uniqueness of the solution of the backfitting algorithm
cannot be proved in general, but its convergence property can be shown. However, it is
demonstrated that the solution of the backfitting algorithm is asymptotically unique and the
objective function for the penalized least squares method is shown to be asymptotically convex.
As will be seen in the subsequent section, the penalized spline estimator in a bivariate setting
has a closed form, which we can use for asymptotic manipulations. The properties of band
matrices play an important role as a mathematical tool in asymptotic considerations. The effect
of the initial value required for implementing the backfitting algorithm is also investigated.

This paper is organized as follows. In Section 2, our model setting and estimating equation 
in the penalized least squares method are discussed and the backfitting algorithm to obtain 
the solution is composed. Section 3 provides the asymptotic bias and variance of the penalized 
spline estimator and then its asymptotic normality is developed. Furthermore, the uniqueness 
of the solution of the backfitting algorithm is discussed. Section 4 includes numerical studies to 
validate the theory and an application to real data is reported. In Section 5, some suggestions 
that are necessary to develop the asymptotics for the general D-variate spline additive model 
are noted by comparing similar results already developed for the kernel estimator. Proofs for 
theoretical results are all given in the Appendix. 



2 Model setting 

2.1 Bivariate additive spline model 

Consider a bivariate additive regression model

$$ y_i = f_1(x_{i1}) + f_2(x_{i2}) + \varepsilon_i \tag{1} $$

for data $\{(y_i, x_{i1}, x_{i2}) : i = 1, \ldots, n\}$, where $f_j(\cdot)$ is an unknown regression function and the $\varepsilon_i$'s
are independent random errors with $E[\varepsilon_i] = 0$ and $V[\varepsilon_i] = \sigma^2(x_{i1}, x_{i2}) < \infty$. We assume
$E[f_j(X_j)] = 0$ $(j = 1, 2)$ to ensure identifiability of $f_j$. Let $q_j(x_j)$ be the density of $X_j$ and
$q(x_1, x_2)$ be the joint density of $(X_1, X_2)$. We assume without loss of generality that $(x_{i1}, x_{i2}) \in
(0,1) \times (0,1)$ for all $i \in \{1, \ldots, n\}$.

Now we consider the $B$-spline model

$$ s_j(x_j) = \sum_{k=-p+1}^{K_n} B_k^{[p]}(x_j)\, b_{j,k} $$

as an approximation to $f_j(x_j)$ at any $x_j \in (0, 1)$ for $j = 1, 2$. Here, $B_k^{[p]}(x)$ $(k = -p+1, \ldots, K_n)$
are $p$th degree $B$-spline basis functions defined recursively as

$$ B_k^{[0]}(x) = \begin{cases} 1, & \kappa_{k-1} < x \le \kappa_k, \\ 0, & \text{otherwise}, \end{cases} $$

$$ B_k^{[p]}(x) = \frac{x - \kappa_{k-1}}{\kappa_{k+p-1} - \kappa_{k-1}}\, B_k^{[p-1]}(x) + \frac{\kappa_{k+p} - x}{\kappa_{k+p} - \kappa_k}\, B_{k+1}^{[p-1]}(x), $$

where $\kappa_k = k/K_n$ $(k = -p+1, \ldots, K_n + p)$ are knots, $K_n = O(n^{\gamma})$ with $0 < \gamma < 1/2$, and
$b_{j,k}$ $(j = 1, 2,\ k = -p+1, \ldots, K_n)$ are unknown parameters. We denote $B_k^{[p]}(x)$ as $B_k(x)$ in what
follows since only the $p$th degree is treated. The details and many properties of the $B$-spline
function are clarified in de Boor (2001). We aim to obtain an estimator of $f_j$ via the $B$-spline
additive regression model

$$ y_i = s_1(x_{i1}) + s_2(x_{i2}) + \varepsilon_i, \tag{2} $$

instead of the model (1). The model (2) can be expressed as

$$ y = X_1 b_1 + X_2 b_2 + \varepsilon $$

by using the notations $y = (y_1 \cdots y_n)'$, $b_1 = (b_{1,-p+1} \cdots b_{1,K_n})'$, $b_2 = (b_{2,-p+1} \cdots b_{2,K_n})'$,
$X_1 = (B_{-p+j}(x_{i1}))_{ij}$, $X_2 = (B_{-p+j}(x_{i2}))_{ij}$ and $\varepsilon = (\varepsilon_1 \cdots \varepsilon_n)'$. We adopt the estimators
$(\hat b_1, \hat b_2)$ of $(b_1, b_2)$ defined as the minimizer of

$$ L(b_1, b_2) = (y - X_1 b_1 - X_2 b_2)'(y - X_1 b_1 - X_2 b_2) + \sum_{j=1}^{2} \lambda_{jn}\, b_j' Q_m b_j, \tag{3} $$

where $\lambda_{jn}$ $(j = 1, 2)$ are smoothing parameters and $Q_m$ is the $m$th order difference matrix. This
criterion is called the penalized least squares method and it has been frequently utilized in spline
regression (Eilers and Marx (1996)). For a fixed point $x_j \in (0, 1)$, the estimator $\hat f_j(x_j)$ of $f_j(x_j)$
is

$$ \hat f_j(x_j) = \sum_{k=-p+1}^{K_n} B_k(x_j)\, \hat b_{j,k} $$

and is called the penalized spline estimator of $f_j(x_j)$. The predictor of $y$ at a fixed point
$(x_1, x_2) \in (0, 1) \times (0, 1)$ is defined as

$$ \hat y = \hat f_1(x_1) + \hat f_2(x_2). $$

Since $E[f_j(X_j)] = 0$ is assumed for $f_j$, the estimator of each component $f_j$ is usually centered.
Hence $\hat f_j(x_j)$ is rewritten as

$$ \hat f_{j,c}(x_j) = \hat f_j(x_j) - \frac{1}{n} \sum_{i=1}^{n} \hat f_j(x_{ij}), $$

as discussed in Wang and Yang (2007). In this paper, however, we do not examine $\hat f_{j,c}$ because
our interests are in asymptotics for $\hat f_j$ and $\hat y$, and the asymptotic distributions of $\hat f_j(x_j)$ and $\hat f_{j,c}(x_j)$
become equivalent.
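
As an illustration of the recursive definition of the basis in this subsection, the following Python sketch evaluates the $K_n + p$ functions $B_k^{[p]}$ on the equidistant knots $\kappa_k = k/K_n$ and stacks them into a design matrix of the form $X_j$. It is only a sketch under stated assumptions: the function name `bspline_design` and the use of NumPy are choices made here, not part of the paper, and the recursion is written with the half-open intervals $(\kappa_{k-1}, \kappa_k]$, which also requires the knot $\kappa_{-p}$.

```python
import numpy as np

def bspline_design(x, K, p):
    """Return the n x (K + p) matrix with entries B_k^{[p]}(x_i), k = -p+1, ..., K.

    Knots are kappa_k = k / K (hypothetical helper; equidistant knots assumed).
    """
    x = np.asarray(x, dtype=float)

    def kappa(k):
        return k / K

    # Degree 0: indicator functions of (kappa_{k-1}, kappa_k], k = -p+1, ..., K+p.
    ks = np.arange(-p + 1, K + p + 1)
    B = np.array([((kappa(k - 1) < x) & (x <= kappa(k))).astype(float) for k in ks]).T

    # Raise the degree one step at a time; at degree d we keep k = -p+1, ..., K+p-d.
    for d in range(1, p + 1):
        ks_d = np.arange(-p + 1, K + p - d + 1)
        new = np.empty((x.size, ks_d.size))
        for c, k in enumerate(ks_d):
            left = (x - kappa(k - 1)) / (kappa(k + d - 1) - kappa(k - 1))
            right = (kappa(k + d) - x) / (kappa(k + d) - kappa(k))
            new[:, c] = left * B[:, c] + right * B[:, c + 1]
        B = new
    return B

# Example: cubic basis (p = 3) with K = 8 on a grid inside (0, 1).
x_grid = np.linspace(0.01, 0.99, 50)
X = bspline_design(x_grid, K=8, p=3)
print(X.shape, X.sum(axis=1).min(), X.sum(axis=1).max())
```

Each row of the resulting matrix has at most $p + 1$ nonzero entries and the rows sum to one on $(0, 1)$, which is the band structure exploited in the asymptotic arguments.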



2.2 Backfitting algorithm 

Let $b = (b_1'\ b_2')'$. In general, $\hat b = (\hat b_1'\ \hat b_2')'$ is a solution of

$$ \frac{\partial L(b)}{\partial b} = 0. \tag{4} $$

In fact, the solution of (4) can be written as $\hat b_1 = \Lambda_1^{-1} X_1'(y - X_2 \hat b_2)$ and $\hat b_2 = \Lambda_2^{-1} X_2'(y - X_1 \hat b_1)$,
where $\Lambda_j = X_j' X_j + \lambda_{jn} Q_m$. However, this method has one defect: $L(b_1, b_2)$ is not in
general convex as a function of $b$. Hence, the solution of (4) does not necessarily become the
minimizer of (3). Marx and Eilers (1998) also noted this point as a typical problem of additive
spline regression.

Let $\tilde b = (\tilde b_1'\ \tilde b_2')'$ be a minimizer of (3). Then it is important to investigate the difference
between $\hat b$ and $\tilde b$ asymptotically. If the difference is vanishingly small, it shows that $\hat b$ asymptotically
minimizes (3). The details of this assertion are given in Section 3.2.

In this paper, our estimator of $(b_1'\ b_2')'$ is composed by using the backfitting algorithm
obtained from the solution of (4). The merit and usage of the backfitting algorithm are clarified
in Hastie and Tibshirani (1990). The $\ell$-stage backfitting estimators $b_1^{(\ell)}$ and $b_2^{(\ell)}$ are defined as

$$ b_1^{(\ell)} = \Lambda_1^{-1} X_1'(y - X_2 b_2^{(\ell-1)}) \quad \text{and} \quad b_2^{(\ell)} = \Lambda_2^{-1} X_2'(y - X_1 b_1^{(\ell)}), $$

respectively, where $b_2^{(0)}$ is an initial value. Then, the $\ell$-stage backfitting estimator $f_j^{(\ell)}(x_j)$ of
$f_j(x_j)$ at $x_j \in (0, 1)$ is obtained as

$$ f_j^{(\ell)}(x_j) = \sum_{k=-p+1}^{K_n} B_k(x_j)\, b_{j,k}^{(\ell)} = B(x_j)' b_j^{(\ell)}, \quad j = 1, 2, $$

where $B(x_j) = (B_{-p+1}(x_j) \cdots B_{K_n}(x_j))'$. A mathematical property of the backfitting algorithm
is that $b^{(\infty)} = (b_1^{(\infty)\prime}\ b_2^{(\infty)\prime})' = \lim_{\ell \to \infty} (b_1^{(\ell)\prime}\ b_2^{(\ell)\prime})'$ satisfies

$$ \left. \frac{\partial L(b)}{\partial b} \right|_{b = b^{(\infty)}} = 0. \tag{5} $$



The backfitting algorithm itself is applicable not only in the bivariate but also in the general D-variate
additive model. However, $b_j^{(\ell)}$ can be explicitly expressed only for the case $D = 2$. By referring
to (5.24) on page 119 of Hastie and Tibshirani (1990), $b_j^{(\ell)}$ can be calculated as

$$ b_1^{(\ell)} = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1' \sum_{j=0}^{\ell-1} (S_1S_2)^j (I_n - S_1)y - (X_1'X_1)^{-1}X_1'(S_1S_2)^{\ell-1} S_1 X_2 b_2^{(0)}, $$

$$ b_2^{(\ell)} = \Lambda_2^{-1}X_2' \sum_{j=0}^{\ell-1} (S_1S_2)^j (I_n - S_1)y + \Lambda_2^{-1}X_2'(S_1S_2)^{\ell-1} S_1 X_2 b_2^{(0)}, \tag{6} $$

where $S_j = X_j \Lambda_j^{-1} X_j'$. It is shown by Theorem 10 of Buja et al. (1989) that $b_j^{(\infty)}$ $(j = 1, 2)$
converge depending on $b_2^{(0)}$. Thus, the backfitting estimators $b_1^{(\infty)}$ and $b_2^{(\infty)}$ converge, but the
vectors to which they converge are not unique, depending on the initial value. We will study the
asymptotic behavior of $f_j^{(\infty)}(x_j) = B(x_j)' b_j^{(\infty)}$ as well as the relationship of $b^{(\infty)}$ and $\tilde b$ from
now on.
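
The backfitting updates and the explicit expression (6) of this subsection can be checked numerically. The Python sketch below is illustrative only and rests on several assumptions made here rather than in the paper: it uses a linear ($p = 1$) B-spline basis for brevity, takes $Q_m = D_m' D_m$ with $D_m$ the $m$th order difference operator, and the names `hat_basis`, `difference_penalty`, `backfit` and `closed_form_b2` are hypothetical helpers. The test functions and the uniform errors are example choices.

```python
import numpy as np

def hat_basis(x, K):
    """Linear (p = 1) B-spline design matrix on knots k/K, k = 0..K (simplifying assumption)."""
    return np.maximum(0.0, 1.0 - np.abs(K * x[:, None] - np.arange(K + 1)[None, :]))

def difference_penalty(q, m=2):
    """Q_m = D_m' D_m with D_m the m-th order difference operator (assumed form of Q_m)."""
    D = np.diff(np.eye(q), n=m, axis=0)
    return D.T @ D

def penalized_gram(X, lam, m=2):
    """Lambda_j = X_j' X_j + lambda_j Q_m."""
    return X.T @ X + lam * difference_penalty(X.shape[1], m)

def backfit(y, X1, X2, lam1, lam2, n_iter, b2_init):
    """Run n_iter >= 1 backfitting stages from b2_init and return (b1, b2)."""
    L1, L2 = penalized_gram(X1, lam1), penalized_gram(X2, lam2)
    b2 = np.array(b2_init, dtype=float)
    for _ in range(n_iter):
        b1 = np.linalg.solve(L1, X1.T @ (y - X2 @ b2))
        b2 = np.linalg.solve(L2, X2.T @ (y - X1 @ b1))
    return b1, b2

def closed_form_b2(y, X1, X2, lam1, lam2, n_iter, b2_init):
    """Evaluate the second line of (6) for the n_iter-stage coefficient vector b2."""
    n = len(y)
    L1, L2 = penalized_gram(X1, lam1), penalized_gram(X2, lam2)
    S1 = X1 @ np.linalg.solve(L1, X1.T)
    S2 = X2 @ np.linalg.solve(L2, X2.T)
    P = S1 @ S2
    resid = (np.eye(n) - S1) @ y
    acc = np.zeros(n)
    power = np.eye(n)                 # holds (S1 S2)^j, starting at j = 0
    for _ in range(n_iter):
        acc += power @ resid
        last = power                  # ends as (S1 S2)^(n_iter - 1)
        power = power @ P
    return np.linalg.solve(L2, X2.T @ (acc + last @ (S1 @ (X2 @ b2_init))))

rng = np.random.default_rng(0)
n, K, lam = 200, 6, 1.0
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
y = np.sin(2 * np.pi * x1) + 0.5 * np.cos(np.pi * x2) + rng.uniform(-0.5, 0.5, size=n)
X1, X2 = hat_basis(x1, K), hat_basis(x2, K)
b2_start = rng.normal(size=K + 1)     # arbitrary initial value b_2^{(0)}

# Five stages of backfitting versus the explicit expression (6).
b1_it, b2_it = backfit(y, X1, X2, lam, lam, 5, b2_start)
b2_cf = closed_form_b2(y, X1, X2, lam, lam, 5, b2_start)
print("max |iterated - closed form|:", np.max(np.abs(b2_it - b2_cf)))

# After many stages the limit should satisfy the stationarity condition (5).
b1, b2 = backfit(y, X1, X2, lam, lam, 200, b2_start)
r = y - X1 @ b1 - X2 @ b2
grad1 = X1.T @ r - lam * difference_penalty(K + 1) @ b1
grad2 = X2.T @ r - lam * difference_penalty(K + 1) @ b2
print("max |gradient|:", max(np.max(np.abs(grad1)), np.max(np.abs(grad2))))
```

Running the sketch from different values of `b2_start` is a simple way to see the dependence of the limit on the initial value that the text describes.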



3 Asymptotic theory 

We prepare some symbols and notations to be used hereafter. Let $I_n$ be the identity matrix of
size $n$. Define a matrix $G_k = (G_{k,ij})_{ij}$, where the $(i, j)$-component is

$$ G_{k,ij} = \int_0^1 B_i(x) B_j(x) q_k(x)\, dx $$

for $k = 1, 2$. Define a matrix $\Sigma_k = (\Sigma_{k,ij})_{ij}$, where the $(i, j)$-component is

$$ \Sigma_{k,ij} = \int_0^1 \int_0^1 \sigma^2(x_1, x_2) B_i(x_k) B_j(x_k) q(x_1, x_2)\, dx_1 dx_2 $$

for $k = 1, 2$.

Let a vector $b_j^*$ be such that $B(\cdot)' b_j^*$ is the best approximation to the true function
$f_j$. For further information on this point, see Zhou et al. (1998).

For a matrix $X_n = (X_{ij,n})_{ij}$, if $\max_{i,j}\{n^{\alpha} |X_{ij,n}|\} = O_P(1)$ $(o_P(1))$, then it is written as $X_n =
O_P(n^{-\alpha} 11')$ $(o_P(n^{-\alpha} 11'))$. This notation will be used for matrices with fixed sizes and sizes
depending on $n$.

In spline smoothing, the smoothing parameter $\lambda_{jn}$ is usually selected as $\lambda_{jn} \to \infty$ with
$n \to \infty$ because a spline curve often yields overfitting for large $n$. In the following, we assume
that $\lambda_{jn} = o(n K_n^{-1})$. Hence, we choose $\lambda_{jn}$ such that $\lambda_{jn} \to \infty$ and $\lambda_{jn} = o(n K_n^{-1})$.

3.1 Asymptotic distribution of the penalized spline estimator 



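Let $f_{0j}^{(\ell)}(x_j)$ be $f_j^{(\ell)}(x_j)$ with $b_2^{(0)} = 0$. Then, $f_1^{(\ell)}(x_1)$ and $f_2^{(\ell)}(x_2)$ with arbitrary initial value
$b_2^{(0)}$ can be expressed as

$$ f_1^{(\ell)}(x_1) = f_{01}^{(\ell)}(x_1) - B(x_1)'(X_1'X_1)^{-1}X_1'(S_1S_2)^{\ell-1} S_1 X_2 b_2^{(0)} $$

and

$$ f_2^{(\ell)}(x_2) = f_{02}^{(\ell)}(x_2) + B(x_2)'\Lambda_2^{-1}X_2'(S_1S_2)^{\ell-1} S_1 X_2 b_2^{(0)}, $$

respectively. First, we investigate the influence of $b_2^{(0)}$ on $f_j^{(\ell)}$, which is summarized as
follows.

Proposition 1 Suppose that $\lambda_{jn} = o(n K_n^{-1})$. Then for $\ell = 1, 2, \ldots$,

$$ |f_1^{(\ell)}(x_1) - f_{01}^{(\ell)}(x_1)| = O_P(K_n^{-2\ell+1}) \quad \text{and} \quad |f_2^{(\ell)}(x_2) - f_{02}^{(\ell)}(x_2)| = O_P(K_n^{-2\ell+1}), \quad \text{as } n \to \infty. $$

In particular, as $\ell \to \infty$, $f_j^{(\infty)}(x_j) = f_{0j}^{(\infty)}(x_j)$, $j = 1, 2$.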

Proposition 1 claims that the influence of $b_2^{(0)}$ on $f_j^{(\ell)}$ can be ignored for large $n$. In other
words, for any initial value $b_2^{(0)} \in \mathbb{R}^{K_n+p}$, we can uniquely obtain $f_j^{(\infty)}(x_j)$ as $n \to \infty$. Hence,
it suffices to consider $f_{0j}^{(\ell)}(x_j)$ instead of $f_j^{(\ell)}(x_j)$ to develop asymptotics under the manipulations
$\ell \to \infty$ and $n \to \infty$. Here, $f_{01}^{(\ell)}(x_1)$ and $f_{02}^{(\ell)}(x_2)$ can be written as

$$ f_{01}^{(\ell)}(x_1) = f_{01}^{(1)}(x_1) - B(x_1)'(X_1'X_1)^{-1}X_1' \sum_{k=1}^{\ell-1} (S_1S_2)^k (I_n - S_1)y $$

and

$$ f_{02}^{(\ell)}(x_2) = f_{02}^{(1)}(x_2) + B(x_2)'\Lambda_2^{-1}X_2' \sum_{k=1}^{\ell-1} (S_1S_2)^k (I_n - S_1)y, $$

respectively. This allows the following.

Proposition 2 Suppose that $\lambda_{jn} = o(n K_n^{-1})$. Then for any fixed point $(x_1, x_2) \in (0, 1) \times (0, 1)$,

$$ f_{01}^{(\infty)}(x_1) = f_{01}^{(1)}(x_1) + o_P(K_n^{-1}) \quad \text{and} \quad f_{02}^{(\infty)}(x_2) = f_{02}^{(1)}(x_2) + o_P(K_n^{-1}), \quad \text{as } n \to \infty. $$

We see from (6) that

$$ f_{01}^{(1)}(x_1) = B(x_1)'\Lambda_1^{-1}X_1'y \quad \text{and} \quad f_{02}^{(1)}(x_2) = B(x_2)'\Lambda_2^{-1}X_2'(I_n - S_1)y. $$

It should be noted that $f_{01}^{(1)}(x_1)$ has the same form as the penalized spline estimator based on
the dataset $\{(y_i, x_{i1}) : i = 1, \ldots, n\}$ in the univariate regression model (D = 1). This form is
very important because the asymptotic bias and variance of the penalized spline estimator for
the univariate regression model have already been derived by Claeskens et al. (2009). Similarly,
$f_{02}^{(1)}(x_2)$ includes $B(x_2)'\Lambda_2^{-1}X_2'y$, which is the same as the penalized spline estimator for
univariate regression based on $\{(y_i, x_{i2}) : i = 1, \ldots, n\}$.

We denote $f_j^{(\infty)}(x_j)$ as $\hat f_j(x_j) = B(x_j)'\hat b_j$, which does not depend on the initial value $b_2^{(0)}$ as
$n \to \infty$. The usefulness of Propositions 1 and 2 is that they establish the asymptotic equivalence
between the backfitting estimator $\hat f_j$ and the (marginal) univariate penalized spline estimator.
By using the results of Claeskens et al. (2009), we obtain Theorem 1.

Theorem 1 Suppose that $f_j \in C^{p+1}$ and $\lambda_{jn} = o(n K_n^{-1})$. Then for any fixed point $(x_1, x_2) \in
(0, 1) \times (0, 1)$, as $n \to \infty$,

$$ E[\hat f_j(x_j)] = f_j(x_j) + b_{j,\lambda}(x_j) + o_P(K_n^{-1}) + o_P(\lambda_{jn} K_n n^{-1}), $$

$$ V\!\left[ \begin{pmatrix} \hat f_1(x_1) \\ \hat f_2(x_2) \end{pmatrix} \right] = \begin{pmatrix} V_1(x_1)(1 + o_P(1)) & O_P(n^{-1}) \\ O_P(n^{-1}) & V_2(x_2)(1 + o_P(1)) \end{pmatrix}, $$

where

$$ b_{j,\lambda}(x) = -\frac{\lambda_{jn}}{n} B(x)' G_j^{-1} Q_m b_j^*, \qquad V_j(x_j) = \frac{1}{n} B(x_j)' G_j^{-1} \Sigma_j G_j^{-1} B(x_j). $$

By using Theorem 1 we have the asymptotic joint distribution of $[\hat f_1(x_1)\ \hat f_2(x_2)]'$.

Theorem 2 Suppose that there exists $\delta > 0$ such that $E[|\varepsilon_i|^{2+\delta}] < \infty$ and $f_j \in C^{p+1}$. Furthermore,
$\gamma$ and $\lambda_{jn}$ satisfy $1/3 < \gamma < 1/2$ and $\lambda_{jn} = o((n K_n^{-1})^{1/2})$. Then for any fixed point
$(x_1, x_2) \in (0, 1) \times (0, 1)$, as $n \to \infty$,

$$ V\!\left[ \begin{pmatrix} \hat f_1(x_1) \\ \hat f_2(x_2) \end{pmatrix} \right]^{-1/2} \begin{pmatrix} \hat f_1(x_1) - f_1(x_1) \\ \hat f_2(x_2) - f_2(x_2) \end{pmatrix} \xrightarrow{D} N_2(0, I_2). $$

From Theorem 2, $\hat f_1(x_1)$ and $\hat f_2(x_2)$ are asymptotically independent. Asymptotic normality
and the independence of $\hat f_1(x_1)$ and $\hat f_2(x_2)$ in kernel smoothing also hold, as shown in Wand
(1999). Thus, the penalized spline estimator and the kernel estimator for the additive model have
the same asymptotic property. Asymptotic normality of $\hat y$ can be shown as a direct consequence
of Theorem 2. We briefly note the pointwise confidence interval for $f_j(x_j)$ by exploiting the
distribution of $\hat f_j(x_j)$ obtained from Theorem 2. Here, we treat $\sigma^2(x_{i1}, x_{i2})$ as known for all
$i \in \{1, \ldots, n\}$, but it should be estimated in data analysis.



Corollary 1 A $100(1 - \alpha)\%$ asymptotic confidence interval of $f_j(x_j)$ at any fixed point $x_j \in
(0, 1)$ is

$$ \hat f_j(x_j) \pm z_{\alpha/2} \sqrt{V[\hat f_j(x_j)]}, $$

where $z_{\alpha/2}$ is the $(1 - \alpha/2)$th normal percentile.

The confidence interval in Corollary 1 will be applied to a set of real data in Section 4, in
which we need to prepare an estimate of $V[\hat f_j(x_j)]$.
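
To make the interval in Corollary 1 concrete, the following Python sketch computes it for the marginal univariate form $B(x)'\Lambda^{-1}X'y$, to which the backfitting estimator is asymptotically equivalent. Several simplifications are assumptions made here and not taken from the paper: a linear ($p = 1$) B-spline basis, $Q_m = D_m' D_m$, a known constant variance $\sigma^2$, and the exact finite-sample variance of the linear smoother in place of an estimate of $V[\hat f_j(x_j)]$. The helper names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def hat_basis(x, K):
    """Linear (p = 1) B-spline design matrix on knots k/K, k = 0..K (simplifying assumption)."""
    return np.maximum(0.0, 1.0 - np.abs(K * x[:, None] - np.arange(K + 1)[None, :]))

def pointwise_ci(x0, x, y, K, lam, sigma2, alpha=0.05, m=2):
    """Return (lower, estimate, upper) for f(x0) from a univariate penalized spline fit."""
    X = hat_basis(x, K)
    D = np.diff(np.eye(K + 1), n=m, axis=0)
    Lam = X.T @ X + lam * D.T @ D                   # Lambda = X'X + lambda Q_m
    b0 = hat_basis(np.atleast_1d(float(x0)), K)[0]  # B(x0)
    w = X @ np.linalg.solve(Lam, b0)                # smoother weights: estimate = w'y
    estimate = w @ y
    variance = sigma2 * (w @ w)                     # sigma^2 B' Lam^{-1} X'X Lam^{-1} B
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # (1 - alpha/2)th normal percentile
    half_width = z * np.sqrt(variance)
    return estimate - half_width, estimate, estimate + half_width

# Illustrative data: one covariate, uniform errors with variance 1/12.
rng = np.random.default_rng(0)
x = rng.uniform(size=300)
y = np.sin(2 * np.pi * x) + rng.uniform(-0.5, 0.5, size=300)
lower, estimate, upper = pointwise_ci(0.25, x, y, K=6, lam=1.0, sigma2=1.0 / 12.0)
print(lower, estimate, upper)
```

In a real analysis $\sigma^2$ would be replaced by an estimate, as the text notes.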

3.2 Minimizer of L(b) 

Here, we discuss the difference between $\hat b = b^{(\infty)}$ and $\tilde b$, the minimizer of (3). Although $\hat b$
is the solution of (4), the problem of whether it minimizes $L(b)$ or not is not trivial. That
is, many solutions of (4) might exist because $L(b)$ is not convex. Let the $M$ solutions of (4)
be $\{\hat b^{1}, \ldots, \hat b^{M}\}$. Then, for any $b_2^{(0)} \in \mathbb{R}^{K_n+p}$, there exists $m \in \{1, \ldots, M\}$ such that
$\hat b = b^{(\infty)} = \hat b^{m}$. However, $b^{(\infty)}$ is asymptotically not dependent on $b_2^{(0)}$, as implied in Proposition
1. Therefore, the uniqueness of the penalized spline estimator obtained by the backfitting
algorithm is asymptotically satisfied. Furthermore, Theorem 3 says that $\hat b$ minimizes $L(b)$.

Theorem 3 Let $H(L)$ be the Hessian matrix of $L(b)$. Then $H(L)$ is asymptotically positive
definite.

We see that asymptotic properties of the penalized spline estimator for the additive model
can be obtained not only by Theorems 1 and 2 but also by Theorem 3.

4 Numerical studies



In this section, we see the behavior of the estimator and validate Theorem 2 numerically by 
simulation. In addition, we aim to obtain an asymptotic confidence interval using a real dataset. 
We utilize the cubic spline (p = 3) and the second order difference penalty (m = 2) in all of the 
following numerical studies. 



4.1 Simulation 



We choose the true functions $f_1(x_1) = \sin(2\pi x_1)$, $f_2(x_2) = 2^{-1}\cos(\pi x_2)$ and the error is $\varepsilon_i \sim
U(-0.5, 0.5)$. Here, $U(a, b)$ is a uniform distribution on an interval $[a, b]$. The explanatory
variables $x_{ij}$ $(i = 1, \ldots, n,\ j = 1, 2)$ are derived from $X_{ij} \sim U(0, 1)$. Then, $f_1$ and $f_2$ satisfy
$E[f_1(X_1)] = 0$ and $E[f_2(X_2)] = 0$, respectively. We demonstrate three simulations.

In Simulation-1, we compare $f_j^{(\ell)}(x_j)$ with the true $f_j(x_j)$.

In Simulation-2, we compare $f_j^{(\ell)}(x_j)$ with

$$ f_{\mathrm{pen},j}(x_j) = B(x_j)'\Lambda_j^{-1}X_j'y, $$

which is the penalized spline estimator for univariate regression based on $\{(y_i, x_{ij}) : i = 1, \ldots, n\}$.

In Simulation-3, we compare the density of $N_2(0, I_2)$ with the kernel density estimate of
simulated

$$ V\!\left[ \begin{pmatrix} f_1^{(\ell)}(x_1) \\ f_2^{(\ell)}(x_2) \end{pmatrix} \right]^{-1/2} \begin{pmatrix} f_1^{(\ell)}(x_1) - f_1(x_1) \\ f_2^{(\ell)}(x_2) - f_2(x_2) \end{pmatrix} \tag{7} $$

to validate Theorem 2, where we note that the covariance matrix of $[f_1^{(\ell)}(x_1)\ f_2^{(\ell)}(x_2)]'$ can be
calculated exactly, and the exact value was used in this simulation. The bandwidth of the kernel density
estimate is selected by the method of Sheather and Jones (1991). The algorithm of Simulation-3
is given as follows:

Step 1 Generate $x_{ij} \sim U(0, 1)$ for $j = 1, 2$, $i = 1, \ldots, n$.

Step 2 Generate the data $\{(y_i, x_{i1}, x_{i2}) \mid i = 1, \ldots, n\}$ from (1) and $\varepsilon_i \sim U(-0.5, 0.5)$.

Step 3 Calculate $f_j^{(\ell)}(x_j)$ at the fixed point $x_1 = x_2 = 0.5$.

Step 4 Calculate the values of (7).

Step 5 Iterate from Step 2 to Step 4, 1000 times.

Step 6 Draw the kernel density estimate of (7) and compare with the density of $N_2(0, I_2)$.
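
Steps 1 to 5 can be sketched in Python as follows. This is a simplified illustration rather than the code used for the reported figures: it assumes a linear ($p = 1$) B-spline basis instead of the cubic one, $Q_m = D_m' D_m$, the initial value $b_2^{(0)} = 0$, and illustrative values of $K_n$ and $\lambda_{jn}$; the function names are hypothetical. With $b_2^{(0)} = 0$ the backfitting estimator is linear in $y$, which is what makes the exact covariance in (7) available. Step 6 (the kernel density estimate) is omitted.

```python
import numpy as np

def hat_basis(x, K):
    """Linear (p = 1) B-spline design matrix on knots k/K (the paper uses cubic splines)."""
    return np.maximum(0.0, 1.0 - np.abs(K * x[:, None] - np.arange(K + 1)[None, :]))

def backfit_weights(x1, x2, x0, K, lam, n_iter, m=2):
    """Return the 2 x n matrix W such that the two estimates at x0 equal W @ y (b2 init = 0)."""
    X1, X2 = hat_basis(x1, K), hat_basis(x2, K)
    D = np.diff(np.eye(K + 1), n=m, axis=0)
    L1 = X1.T @ X1 + lam * D.T @ D
    L2 = X2.T @ X2 + lam * D.T @ D
    n = len(x1)
    Y = np.eye(n)                     # backfit every unit vector at once (the map is linear)
    C2 = np.zeros((K + 1, n))         # initial value b_2^{(0)} = 0
    for _ in range(n_iter):
        C1 = np.linalg.solve(L1, X1.T @ (Y - X2 @ C2))
        C2 = np.linalg.solve(L2, X2.T @ (Y - X1 @ C1))
    B = hat_basis(np.asarray(x0, dtype=float), K)   # rows B(x0[0])' and B(x0[1])'
    return np.vstack([B[0] @ C1, B[1] @ C2])

def simulate_standardized(n, K, lam, n_iter, reps, seed=0, x0=(0.5, 0.5)):
    """Return a reps x 2 array of draws of the standardized statistic (7)."""
    rng = np.random.default_rng(seed)
    f1 = lambda t: np.sin(2 * np.pi * t)
    f2 = lambda t: 0.5 * np.cos(np.pi * t)
    x1, x2 = rng.uniform(size=n), rng.uniform(size=n)            # Step 1
    W = backfit_weights(x1, x2, x0, K, lam, n_iter)
    cov = W @ W.T / 12.0              # exact covariance given x; Var U(-0.5, 0.5) = 1/12
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    truth = np.array([f1(x0[0]), f2(x0[1])])
    out = np.empty((reps, 2))
    for r in range(reps):                                        # Step 5
        y = f1(x1) + f2(x2) + rng.uniform(-0.5, 0.5, size=n)     # Step 2
        out[r] = inv_sqrt @ (W @ y - truth)                      # Steps 3 and 4
    return out

draws = simulate_standardized(n=150, K=6, lam=1.0, n_iter=10, reps=400)
print(draws.mean(axis=0), draws.var(axis=0))
```

Because the covariance is exact, the sample variances of the two columns should be close to one, while any remaining bias of the estimator shows up as a shift of the sample means away from zero.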



The results of Simulation-1, Simulation-2 and Simulation-3 are displayed in Figure 1, Figure
2, and Figure 3, respectively. In all simulation settings, $K_n = 2n^{2/5}$, $\lambda_{1n} = \lambda_{2n} = 2n^{2/5} K_n^{-1/2}$
and $\ell = 10$ were adopted. We set the sample size $n = 1000$ for Simulation-1 and Simulation-2,
and $n = 100$ and $n = 1000$ for Simulation-3.

Figure 1: The curves of $f_j^{(\ell)}(x_j)$ and $f_j(x_j)$. The left panel is for $f_1^{(\ell)}$ (solid line) and $f_1$ (dashed
line). The right panel is for $f_2^{(\ell)}$ (solid) and $f_2$ (dashed).

We see from Figure 1 that the backfitting estimator $f_j^{(10)}$ approximates $f_j$ well. We also
observe in Figure 2 that the differences between $f_j^{(10)}$ and $f_{\mathrm{pen},j}(x_j)$ are small, which means that
$f_{\mathrm{pen},j}$ dominates the backfitting estimator, as claimed in Proposition 2.

The contour plots of the density estimate of (7) and of the density of $N_2(0, I_2)$ are drawn
in Figure 3. We observe that there is still a gap between the density estimate and the density
of $N_2(0, I_2)$ for $n = 100$. However, we see from the case $n = 1000$ that the density estimate is
clearly approaching the density of $N_2(0, I_2)$, as claimed in Theorem 2.

4.2 Application to real data 

We construct the asymptotic pointwise confidence interval of $f_j(x_j)$ by using real data. We utilize
ozone data with $n = 111$ (Hastie et al. (2001)). We use model (1), where $y$ is ozone concentration
(ppb), $x_1$ is daily maximum temperature (°C) and $x_2$ is wind speed (mph). Each $y_i$ is centered
and the $x_{ij}$'s are rescaled as $x_{ij} / \max_{1 \le i \le n} x_{ij}$. We composed the backfitting estimator $f_j^{(\ell)}(x_j)$ and
the asymptotic pointwise confidence interval of $f_j(x_j)$ under the assumption that $\sigma^2(x_1, x_2) = \sigma^2$,
which can be estimated by



Again, we used $K_n = 2n^{2/5}$, $\lambda_{1n} = \lambda_{2n} = 2n^{2/5} K_n^{-1/2}$ and $\ell = 10$.

Hastie and Tibshirani (1990) estimated $f_j$ $(j = 1, 2)$ by using a pseudo additive method
based on a smoothing spline. In addition, they constructed a pointwise error bar defined as
$\hat f_j \pm 2 \times$ standard error, which is drawn in Figure 9.9 of Hastie and Tibshirani (1990). The
asymptotic pointwise confidence interval exhibited in Figure 4 looks quite similar to the error
bar. However, we see that the asymptotic intervals given in Figure 4 are both smoother than
the error bars. Although this is only an application to one dataset, we thus confirm that the
confidence intervals based on asymptotic normality can be applied to real data.

Figure 2: The curves of $f_j^{(\ell)}(x_j)$ and $f_{\mathrm{pen},j}(x_j)$. The lines $f_1^{(\ell)}(x_1)$ (solid) and $f_{\mathrm{pen},1}(x_1)$ (dashed)
are drawn in the left panel (horizontal axis: x1). The lines $f_2^{(\ell)}(x_2)$ (solid) and $f_{\mathrm{pen},2}(x_2)$ (dashed) are drawn in
the right panel (horizontal axis: x2).

Figure 3: The density estimate of (7) (solid line) and the density of $N_2(0, I_2)$ (dashed line). The
left panel is $n = 100$, and the right panel is $n = 1000$. The contour lines of $N_2(0, I_2)$ are 0.02,
0.04, 0.06, 0.08 and 0.1.

Figure 4: The asymptotic pointwise confidence intervals for $f_j(x_j)$: the left panel is $(y_i, x_{i1})$ and
the right is $(y_i, x_{i2})$ (horizontal axes: radiation, wind). The solid lines are 95% confidence intervals and the dashed line is $f_j^{(\ell)}(x_j)$
in both panels.

5 Discussion 



In this paper, asymptotic behavior of the penalized spline estimator in the bivariate additive
model is investigated. The research in this paper can be seen as a spline version of the work by
Ruppert and Opsomer (1997) and Wand (1999). To consider a generalization of the work in this
paper to the D-variate additive model, it might be worthwhile to review the work by Opsomer
(2000), including local polynomial fitting in the D-variate additive model, as introduced in
Section 1. Let $f_d = (f_d(x_{1d}) \cdots f_d(x_{nd}))'$ $(d = 1, \ldots, D)$. Then a formal estimating equation
yields the estimator $\hat f_d$ of $f_d$ as

$$ \begin{pmatrix} \hat f_1 \\ \hat f_2 \\ \vdots \\ \hat f_D \end{pmatrix} = \begin{pmatrix} I_n & S_1 & \cdots & S_1 \\ S_2 & I_n & \cdots & S_2 \\ \vdots & & \ddots & \vdots \\ S_D & S_D & \cdots & I_n \end{pmatrix}^{-1} \begin{pmatrix} S_1 \\ S_2 \\ \vdots \\ S_D \end{pmatrix} y \equiv M^{-1} C y \tag{8} $$

provided that $M^{-1}$ exists, where $S_d$ $(d = 1, \ldots, D)$ are kernel smoothers, as discussed in Opsomer
(2000). In practice, the estimator is composed by the backfitting algorithm

$$ \begin{pmatrix} \hat f_1^{(\ell)} \\ \vdots \\ \hat f_D^{(\ell)} \end{pmatrix} = \begin{pmatrix} S_1\big(y - \hat f_2^{(\ell-1)} - \cdots - \hat f_D^{(\ell-1)}\big) \\ \vdots \\ S_D\big(y - \hat f_1^{(\ell)} - \cdots - \hat f_{D-1}^{(\ell)}\big) \end{pmatrix} \tag{9} $$

instead of (8), to avoid the computational cost of $M^{-1}$. If $M^{-1}$ exists, it is known
that the $\hat f_d^{(\ell)}$ in (9) converges to the unique $\hat f_d$ in (8) as $\ell \to \infty$. Opsomer (2000) assumes the
sufficient condition for the existence of $M^{-1}$, by which the asymptotic bias and variance of the
backfitting estimator for the D-variate additive model can be obtained. It is shown by Ruppert
and Opsomer (1997) that $M^{-1}$ certainly exists for the case $D = 2$. Thus, we see that even
in kernel smoothing, such a generalization from bivariate to D-variate in the additive model
involves mathematical difficulty.

On the other hand, in the spline method for $D \ge 2$, the smoother is $S_d = X_d \Lambda_d^{-1} X_d'$ and
$f_d = X_d b_d$, where $b_d$ is an unknown parameter vector. The corresponding matrix $M$ does not
have an inverse, even for $D = 2$, as detailed in Marx and Eilers (1998). Thus, the estimator of
$f_d$ cannot be written in the form of (8) and so it might not be reasonable to assume the existence
of $M^{-1}$ as the kernel method did. The reason why we could proceed with asymptotics for $f_j^{(\infty)}$
is that the explicit form of the backfitting estimator can be obtained, which seems to be
impossible for the case $D \ge 3$. Currently, the only result in this paper that can be generalized
to $D \ge 3$ is Theorem 3.

Although it is beyond the scope of this paper, it might be possible to discuss the asymp- 
totics for the penalized spline in the generalized additive model (GAM) in a similar manner. 
Kauermann et al. (2009) studied asymptotic properties of spline regression in the univariate 
generalized linear model. Therefore, the asymptotic theory of the penalized spline in the GAM 
may be considered for further research. 



Appendix 

For the proofs of Propositions 1-2 and Theorems 1-3, we define $G_{jn} = n^{-1} X_j' X_j$ $(j = 1, 2)$,
$G_{12n} = n^{-1} X_1' X_2$, $G_{21n} = G_{12n}'$, and $\Lambda_{jn} = n^{-1} \Lambda_j$ $(j = 1, 2)$. We need additional lemmas as
follows.

Lemma 1 $G_{jn}$, $G_{12n}$ and $\Lambda_{jn}$ satisfy $G_{jn} = O_P(K_n^{-1} 11')$, $G_{12n} = O_P(K_n^{-2} 11')$ and $\Lambda_{jn}^{-1} =
O_P(K_n 11')$.

Proof of Lemma 1 For $j = 1, 2$, proofs for $G_{jn} = G_j + o_P(K_n^{-1} 11')$, $G_j = O(K_n^{-1} 11')$ and
$\Lambda_{jn}^{-1} = O_P(K_n 11')$ have already been given in Claeskens et al. (2009). Hence we are going to
show $G_{12n} = O_P(K_n^{-2} 11')$.

Let $G_{12n} = (g_{ij,n})_{ij}$. The $(k, h)$-component of $G_{12n}$ is

$$ g_{kh,n} = \frac{1}{n} \sum_{i=1}^{n} B_{-p+k}(x_{i1}) B_{-p+h}(x_{i2}). $$

Then $g_{kh,n}$ can be asymptotically expressed as

$$ g_{kh,n} = E[B_{-p+k}(X_1) B_{-p+h}(X_2)] + O_P\!\left( \sqrt{\frac{1}{n} V[B_{-p+k}(X_1) B_{-p+h}(X_2)]} \right). $$

The $E[B_{-p+k}(X_1) B_{-p+h}(X_2)]$ is bounded by

$$ \min_{u,v \in (0,1)} \{q(u, v)\} \int_0^1 B_{-p+k}(x)\, dx \int_0^1 B_{-p+h}(y)\, dy \le \int_0^1 \int_0^1 B_{-p+k}(x) B_{-p+h}(y) q(x, y)\, dx\, dy \le \max_{u,v \in (0,1)} \{q(u, v)\} \int_0^1 B_{-p+k}(x)\, dx \int_0^1 B_{-p+h}(y)\, dy. $$

Hence we get $E[B_{-p+k}(X_1) B_{-p+h}(X_2)] = O(K_n^{-2})$ because $\int_0^1 B_{-p+k}(x)\, dx = K_n^{-1}$, see de Boor
(2001). Similarly,

$$ E[\{B_{-p+k}(X_1) B_{-p+h}(X_2)\}^2] = \int_0^1 \int_0^1 \{B_{-p+k}(x)\}^2 \{B_{-p+h}(y)\}^2 q(x, y)\, dx\, dy = O(K_n^{-2}). $$

Hence we have

$$ \frac{1}{n} V[B_{-p+k}(X_1) B_{-p+h}(X_2)] = O(K_n^{-2} n^{-1}). $$

Since $n^{-1/2} = o(K_n^{-1})$, we have $g_{kh,n} = O_P(K_n^{-2})$. □

The $G_{jn}$ $(j = 1, 2)$ and $G_{12n}$ are band matrices: for the $(i, k)$-component of $G_{jn}$ and $G_{12n}$, if
$|i - k| \le p$, it is positive and it is $0$ if $|i - k| > p$.

Lemma 2 Let $A = (a_{ij})_{ij}$ and $B = (b_{ij})_{ij}$ be $K_n \times K_n$ matrices. Assume that $K_n \to \infty$ as
$n \to \infty$, $A = O_P(n^{\alpha} 11')$ and $B$ has $b_{ij} = 0$ if $|i - j| > p$ and $b_{ij} = O_P(n^{\beta})$ if $|i - j| \le p$, where
$\alpha, \beta \in \mathbb{R}$. Then $AB = O_P(n^{\alpha+\beta} 11')$.

Proof of Lemma 2 By the structural assumption on $B$, the $(i, j)$-component of $AB$ is

$$ \sum_{k=1}^{K_n} a_{ik} b_{kj} = \sum_{j-p \le k \le j+p} a_{ik} b_{kj} = O_P(n^{\alpha+\beta}). \qquad □ $$



Lemma 3 Let $A = (a_{ij})_{ij}$ and $B = (b_{ij})_{ij}$ be $K_n \times K_n$ matrices. Assume that $K_n \to \infty$
as $n \to \infty$, $A = O_P(n^{\alpha} 11')$ and there exist constants $C > 0$, $\mu \in (0, 1)$ such that $|b_{ij}| \le
n^{\beta} C \mu^{|i-j|} (1 + o_P(1))$ for $1 \le i, j \le K_n$. Then $AB = O_P(n^{\alpha+\beta} 11')$.

Proof of Lemma 3 The $(i, j)$-component of $AB$ can be evaluated as

$$ |(AB)_{ij}| = \left| \sum_{k=1}^{K_n} a_{ik} b_{kj} \right| \le \max_{1 \le k \le K_n} \{|a_{ik}|\} \sum_{k=1}^{K_n} |b_{kj}|. $$

Since we have

$$ \sum_{k=1}^{K_n} |b_{kj}| \le n^{\beta} C (1 + o_P(1)) \sum_{k=1}^{K_n} \mu^{|k-j|} = n^{\beta} C (1 + o_P(1)) \left( \sum_{k=1}^{j} \mu^{j-k} + \sum_{k=j+1}^{K_n} \mu^{k-j} \right) \le n^{\beta} C (1 + o_P(1)) \left( \sum_{k=0}^{\infty} \mu^{k} + \sum_{k=0}^{\infty} \mu^{k} \right) = n^{\beta} C (1 + o_P(1)) \frac{2}{1 - \mu}, $$

it follows that

$$ |(AB)_{ij}| \le \max_{1 \le k \le K_n} \{|a_{ik}|\}\, n^{\beta} C \frac{2}{1 - \mu} (1 + o_P(1)) $$

and hence $AB = O_P(n^{\alpha+\beta} 11')$. □



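Lemma 4 Suppose that $\lambda_{jn} = o(n K_n^{-1})$. Then for $j = 1, 2$, there exist constants $C_j > 0$ and
$\mu_j \in (0, 1)$ such that $|(\Lambda_{jn}^{-1})_{ik}| \le K_n C_j \mu_j^{|i-k|} (1 + o_P(1))$, where $(\Lambda_{jn}^{-1})_{ik}$ is the $(i, k)$-component
of $\Lambda_{jn}^{-1}$.

Proof of Lemma 4 Let $\Lambda_{j\cdot} = G_j + \lambda_{jn} n^{-1} Q_m$. The $\Lambda_{jn}$ can be written as

$$ \Lambda_{jn} = G_{jn} + \frac{\lambda_{jn}}{n} Q_m = G_j + \frac{\lambda_{jn}}{n} Q_m + G_{jn} - G_j = \Lambda_{j\cdot} + (G_{jn} - G_j) $$

by Lemma 1. Hence we have

$$ \Lambda_{jn}^{-1} = \{\Lambda_{j\cdot} + (G_{jn} - G_j)\}^{-1} = \Lambda_{j\cdot}^{-1} \big(I_{K_n+p} + (G_{jn} - G_j)\Lambda_{j\cdot}^{-1}\big)^{-1}. $$

Let $A_n = -(G_{jn} - G_j)\Lambda_{j\cdot}^{-1}$. By Lemma 6 the maximum eigenvalue of $G_{jn} - G_j$ becomes $o_P(K_n^{-1})$
and hence $\|G_{jn} - G_j\|_2 \le o_P(K_n^{-1})$. Here, for an $n \times n$ matrix $A$, $\|A\|_2 = \sup_{x \ne 0} \{\|Ax\| / \|x\|\}$,
where $\|x\| = \sqrt{x'x}$ for $x \in \mathbb{R}^n$. Further, $\|\Lambda_{j\cdot}^{-1}\|_2 \le O(K_n)$ can also be obtained by the proof
of Lemma 3 and Lemma A1 in Claeskens et al. (2009). Hence there exists $N_0$ such that for
any $n \ge N_0$, $\|A_n\|_2 \le \|G_{jn} - G_j\|_2 \|\Lambda_{j\cdot}^{-1}\|_2 < 1$. From a well-known result of matrix theory, for
$n \ge N_0$, $(I_{K_n+p} - A_n)^{-1}$ exists and the equality

$$ (I_{K_n+p} - A_n)^{-1} = \sum_{k=0}^{\infty} A_n^k = I_{K_n+p} + A_n \left( \sum_{k=0}^{\infty} A_n^k \right) $$

holds. So we have

$$ \sum_{k=0}^{\infty} A_n^k = O_P(11'). $$

Lemmas 1, 2 and 3 yield

$$ A_n \left( \sum_{k=0}^{\infty} A_n^k \right) = -(G_{jn} - G_j)\Lambda_{j\cdot}^{-1} \left( \sum_{k=0}^{\infty} A_n^k \right) = -(G_{jn} - G_j)\Lambda_{j\cdot}^{-1} O_P(11') = -(G_{jn} - G_j) O_P(K_n 11') = o_P(11') $$

for $n \ge N_0$. Therefore $\Lambda_{jn}^{-1}$ can be asymptotically expressed as

$$ \Lambda_{jn}^{-1} = \Lambda_{j\cdot}^{-1} \big(I_{K_n+p} + (G_{jn} - G_j)\Lambda_{j\cdot}^{-1}\big)^{-1} = \Lambda_{j\cdot}^{-1} \{I_{K_n+p} + o_P(11')\} $$

and its $(i, k)$-component becomes

$$ (\Lambda_{jn}^{-1})_{ik} = (\Lambda_{j\cdot}^{-1})_{ik} + o_P(K_n) $$

because $\Lambda_{j\cdot}^{-1} o_P(11') = o_P(K_n 11')$ by Lemma 3. From Claeskens et al. (2009), there exist
constants $C_j > 0$ and $\mu_j \in (0, 1)$ such that $|(\Lambda_{j\cdot}^{-1})_{ik}| \le K_n C_j \mu_j^{|i-k|}$. Hence we finally have

$$ |(\Lambda_{jn}^{-1})_{ik}| \le |(\Lambda_{j\cdot}^{-1})_{ik}| + o_P(K_n) \le K_n C_j \mu_j^{|i-k|} + o_P(K_n) = K_n C_j \mu_j^{|i-k|} \{1 + o_P(1)\}. \qquad □ $$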


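Lemma 5 For $\ell \ge 1$, there exists a matrix $R_n^{(\ell)} = O_P(K_n^{-2(\ell-1)} 11')$ such that $(S_1S_2)^{\ell} =
n^{-1} X_1 R_n^{(\ell)} X_2'$.

Proof of Lemma 5 We use induction. First we have the expression

$$ S_1S_2 = \frac{1}{n^2} X_1 \Lambda_{1n}^{-1} X_1' X_2 \Lambda_{2n}^{-1} X_2' = \frac{1}{n} X_1 \Lambda_{1n}^{-1} G_{12n} \Lambda_{2n}^{-1} X_2'. $$

Let $R_n^{(1)} = \Lambda_{1n}^{-1} G_{12n} \Lambda_{2n}^{-1}$. Then by Lemmas 1, 2, 3 and 4, we have $R_n^{(1)} = O_P(11')$. Next we
assume that $(S_1S_2)^{\ell}$ can be expressed as

$$ (S_1S_2)^{\ell} = \frac{1}{n} X_1 R_n^{(\ell)} X_2', $$

where $R_n^{(\ell)} = O_P(K_n^{-2(\ell-1)} 11')$. For $\ell + 1$,

$$ (S_1S_2)^{\ell+1} = \frac{1}{n} X_1 R_n^{(\ell)} X_2' S_1 S_2 = \frac{1}{n^2} X_1 R_n^{(\ell)} X_2' X_1 \Lambda_{1n}^{-1} G_{12n} \Lambda_{2n}^{-1} X_2' = \frac{1}{n} X_1 R_n^{(\ell)} G_{21n} \Lambda_{1n}^{-1} G_{12n} \Lambda_{2n}^{-1} X_2'. $$

So we put $R_n^{(\ell+1)} = R_n^{(\ell)} G_{21n} \Lambda_{1n}^{-1} G_{12n} \Lambda_{2n}^{-1}$. From Lemma 2 we get $R_n^{(\ell)} G_{21n} = O_P(K_n^{-2\ell} 11')$.
Furthermore, by using Lemmas 3 and 4, $(R_n^{(\ell)} G_{21n})\Lambda_{1n}^{-1} = O_P(K_n^{-2\ell+1} 11')$ can be obtained. By
the repeated use of Lemma 2 and Lemma 3 in the same manner, we have

$$ R_n^{(\ell+1)} = O_P(K_n^{-2\ell+1} 11')\, G_{12n} \Lambda_{2n}^{-1} = O_P(K_n^{-2\ell} 11'). \qquad □ $$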



Lemma 6 The maximum eigenvalues of K n (Gj n —Gj) and K n G\ 2n ore asymptotically vanished. 



Proof of Lemma 6. Let $A_n = K_n(G_{1n} - G_1) = (a_{ij,n})_{ij}$. Then $a_{ij,n} = o_P(1)$ if $|i - j| \le p$, and
$a_{ij,n} = 0$ if $|i - j| > p$, by Lemma 1. Let $\lambda_{\max}(A_n)$ be the maximum eigenvalue of $A_n$. Then there
exists $x = (x_1 \cdots x_{K_n+p})' \in \mathbb{R}^{K_n+p}$ such that

\[ A_nx = \lambda_{\max}(A_n)x. \]

This $x$ is an eigenvector of $A_n$ belonging to $\lambda_{\max}(A_n)$. Let $|x_m| = \max\{|x_1|, \cdots, |x_{K_n+p}|\}$. Then we
have

\[ |\lambda_{\max}(A_n)x_m| = \Big|\sum_{j=1}^{K_n+p}a_{mj,n}x_j\Big|. \]

Hence $|\lambda_{\max}(A_n)|$ can be bounded as

\[
\begin{aligned}
|\lambda_{\max}(A_n)| &= \Big|\sum_{j=1}^{K_n+p}a_{mj,n}\frac{x_j}{x_m}\Big| \\
&\le \sum_{j=1}^{K_n+p}|a_{mj,n}|\frac{|x_j|}{|x_m|} \\
&\le \sum_{j=1}^{K_n+p}|a_{mj,n}| \\
&= \sum_{j=m-p}^{m+p}|a_{mj,n}| \\
&= o_P(1)
\end{aligned}
\]

from the structure of $A_n$. The matrix $K_nG_{12n}$ is also a band matrix satisfying $K_nG_{12n} = o_P(\mathbf{1}\mathbf{1}')$, so
we can prove that the maximum eigenvalue of $K_nG_{12n}$ is $o_P(1)$ in the same manner. □
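The row-sum argument in this proof can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the computation of the paper: `band_matrix` builds a hypothetical symmetric band matrix whose entries within bandwidth `p` are at most `eps` in absolute value (a stand-in for a band matrix with small entries), and `dominant_eigenvalue` is a plain power iteration introduced only for this illustration.

```python
def band_matrix(size, p, eps):
    """Hypothetical symmetric band matrix: |entry| <= eps if |i-j| <= p, else 0."""
    return [[eps / (1 + abs(i - j)) if abs(i - j) <= p else 0.0
             for j in range(size)] for i in range(size)]


def max_abs_row_sum(a):
    """The bound used in the proof: max over rows m of sum_j |a_mj|."""
    return max(sum(abs(v) for v in row) for row in a)


def dominant_eigenvalue(a, iters=200):
    """Power iteration followed by a Rayleigh quotient (illustrative helper)."""
    n = len(a)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = max(abs(v) for v in y)
        x = [v / norm for v in y]
    y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


p = 3
for eps in (1e-1, 1e-2, 1e-3):
    a = band_matrix(30, p, eps)
    print(eps, dominant_eigenvalue(a), max_abs_row_sum(a))
```

In this sketch the row-sum bound involves at most $2p + 1$ nonzero terms per row, so it does not grow with the matrix size and shrinks at the same rate as the entries, which is the mechanism the proof relies on for a band matrix.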

We are now in a position to give the proofs of all results in Section 3.

Proof of Proposition^ First we prove \ ff\x\) — /o?( x i)l = Op{K~ 2f ). We have 

fPfa) - /d?(*i) = -B{x 1 )\x' 1 x 1 )- 1 x' 1 {s 1 s 2 y- l s 1 x 2 bf 
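and there exists $R_n^{(\ell-1)} = O_P(K_n^{-2(\ell-2)}\mathbf{1}\mathbf{1}')$ such that

\[
\begin{aligned}
B(x_1)'(X_1'X_1)^{-1}X_1'(S_1S_2)^{\ell-1}S_1X_2b_2^{(0)} &= \frac{1}{n}B(x_1)'R_n^{(\ell-1)}X_2'S_1X_2b_2^{(0)} \\
&= B(x_1)'R_n^{(\ell-1)}G_{21n}\Lambda_{1n}^{-1}G_{12n}b_2^{(0)}
\end{aligned}
\]

by Lemma 5. We see from the proof of Lemma 5 that $R_n^{(\ell)}$ consists of the product of $\Lambda_{1n}^{-1}$,
$G_{12n}$, $\Lambda_{2n}^{-1}$ and $G_{21n}$ because

\[
\begin{aligned}
(S_1S_2)^{\ell} &= (S_1S_2)(S_1S_2)\cdots(S_1S_2) \\
&= \frac{1}{n^2}X_1\Lambda_{1n}^{-1}X_1'X_2\Lambda_{2n}^{-1}X_2'(S_1S_2)\cdots(S_1S_2) \\
&= \frac{1}{n}X_1\{\Lambda_{1n}^{-1}G_{12n}\Lambda_{2n}^{-1}G_{21n}\cdots G_{21n}\Lambda_{1n}^{-1}G_{12n}\Lambda_{2n}^{-1}\}X_2'. \qquad (10)
\end{aligned}
\]

Therefore, by Lemmas 2 and 3, we have

\[
\begin{aligned}
R_n^{(\ell-1)}G_{21n}\Lambda_{1n}^{-1}G_{12n}b_2^{(0)} &= R_n^{(\ell-1)}G_{21n}\Lambda_{1n}^{-1}O_P(K_n^{-2}\mathbf{1}) \\
&= R_n^{(\ell-1)}G_{21n}O_P(K_n^{-1}\mathbf{1}) \\
&= O_P(K_n^{-2\ell+1}\mathbf{1}),
\end{aligned}
\]

where $O_P(n^a\mathbf{1})$ is the vector version of $O_P(n^a\mathbf{1}\mathbf{1}')$. Because only $p + 1$ components of $B(x_1)$ are
nonzero and the others are 0, like a column of a band matrix, by the property of the B-spline basis, we have

\[ B(x_1)'R_n^{(\ell-1)}G_{21n}\Lambda_{1n}^{-1}G_{12n}b_2^{(0)} = O_P(K_n^{-2\ell+1}) \]

though the size of $B(x_1)$ increases with $n$. Similarly, we see that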

£\x 2 ) - fS(x 2 ) = B{x2)'A 2 1 X' 2 {S 1 S 2 f- 1 S 1 X 2 bf 

becomes 

ff\x2)-flS{x 2 ) = B{x2)'A^G 21n R^G 2ln A^G 12n b^ 
= P {K n K- 2 K- 2 ^K~ 2 K n K- 1 ) 
= o P (K- 2 ^), 

which completes the proof. □ 

Proof of Proposition 2. By Lemma 5, we have

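\[
\begin{aligned}
\hat f_{01}^{(\ell)}(x_1) &= B(x_1)'(X_1'X_1)^{-1}X_1'\Big\{I_n - \sum_{k=0}^{\ell-1}(S_1S_2)^k(I_n - S_1)\Big\}y \\
&= \hat f_{01}^{(1)}(x_1) - B(x_1)'(X_1'X_1)^{-1}X_1'\sum_{k=1}^{\ell-1}(S_1S_2)^k(I_n - S_1)y \\
&= \hat f_{01}^{(1)}(x_1) - B(x_1)'\Big\{\sum_{k=1}^{\ell-1}R_n^{(k)}\Big\}(X_2' - G_{21n}\Lambda_{1n}^{-1}X_1')\frac{1}{n}y
\end{aligned}
\]

and

\[
\begin{aligned}
\hat f_{02}^{(\ell)}(x_2) &= B(x_2)'\Lambda_{2n}^{-1}\frac{1}{n}X_2'\sum_{k=0}^{\ell-1}(S_1S_2)^k(I_n - S_1)y \\
&= \hat f_{02}^{(1)}(x_2) + B(x_2)'\Lambda_{2n}^{-1}\frac{1}{n}X_2'\sum_{k=1}^{\ell-1}(S_1S_2)^k(I_n - S_1)y \\
&= \hat f_{02}^{(1)}(x_2) + B(x_2)'\Lambda_{2n}^{-1}G_{21n}\Big\{\sum_{k=1}^{\ell-1}R_n^{(k)}\Big\}(X_2' - G_{21n}\Lambda_{1n}^{-1}X_1')\frac{1}{n}y.
\end{aligned}
\]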



We shall focus on the sum $\sum_{k=1}^{\ell-1}R_n^{(k)}$. We put

\[ R_n = \lim_{\ell\to\infty}\sum_{k=1}^{\ell-1}R_n^{(k)} = \sum_{k=1}^{\infty}R_n^{(k)} = (r_{ij,n})_{ij}. \]

Then, since the backfitting algorithm converges for any $n$, $|r_{ij,n}|$ is bounded for any $(i,j)$ and
$n$. Hence $r_n = \max_{i,j}|r_{ij,n}|$ is also bounded for any $n$, which implies

\[ R_n = O_P(\mathbf{1}\mathbf{1}'). \]

Let $f = (f_1(x_{11}) + f_2(x_{12}) \cdots f_1(x_{n1}) + f_2(x_{n2}))'$. Then the absolute value of the $h$-th component
of $n^{-1}X_j'f$ is

\[ \Big|\frac{1}{n}\sum_{i=1}^{n}B_{-p+h}(x_{ij})(f_1(x_{i1}) + f_2(x_{i2}))\Big| \le \max_{u,v\in(0,1)}\{|f_1(u) + f_2(v)|\}\frac{1}{n}\sum_{i=1}^{n}B_{-p+h}(x_{ij}) = O_P(K_n^{-1}) \]

because $\max_{u,v\in(0,1)}\{|f_1(u) + f_2(v)|\} < \infty$. Hence $n^{-1}X_j'f = O_P(K_n^{-1}\mathbf{1})$ can be obtained. From
(10) and the repeated use of Lemma 2 and Lemma 3 we have

\[ R_nn^{-1}X_j'f = O_P(K_n^{-1}\mathbf{1}). \]

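And direct calculation gives

\[
\begin{aligned}
|E[\hat f_1^{(\infty)}(x_1) - \hat f_{01}^{(1)}(x_1)]| &= \Big|B(x_1)'R_n(X_2' - G_{21n}\Lambda_{1n}^{-1}X_1')\frac{1}{n}f\Big| \\
&= B(x_1)'O_P(K_n^{-1}\mathbf{1}) + B(x_1)'O_P(K_n^{-2}\mathbf{1}) \\
&= O_P(K_n^{-1})
\end{aligned}
\]

because only $p + 1$ components of $B(x_1)$ are nonzero and the others are 0. Here, $O_P(K_n^{-1}\mathbf{1})$ is the
vector version of $O_P(K_n^{-1}\mathbf{1}\mathbf{1}')$. Similarly, we have

\[
\begin{aligned}
|E[\hat f_2^{(\infty)}(x_2) - \hat f_{02}^{(1)}(x_2)]| &= \Big|B(x_2)'\Lambda_{2n}^{-1}G_{21n}R_n(X_2' - G_{21n}\Lambda_{1n}^{-1}X_1')\frac{1}{n}f\Big| \\
&= O_P(K_nK_n^{-2}K_n^{-1}) \\
&= o_P(K_n^{-1}).
\end{aligned}
\]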
Next, we consider $V[\hat f_1^{(\infty)}(x_1) - \hat f_{01}^{(1)}(x_1)]$. Let $\Sigma = \mathrm{diag}[\sigma^2(x_{11}, x_{12}) \cdots \sigma^2(x_{n1}, x_{n2})]$. Then
since n^XfiXj = + o P {K- l lV) = P {K- l ll'){j = 1, 2), we have 

= R n B{ Xl )'R n (X' 2 - G 21n A- 1 X[)isi(X 2 - X 1 h^G 12n )B! n B{xx) 
= -B{xi)'RrJl2R' n B{xi) 



2?-B(x i yR n ^X' 2 XX 1 A^G 1 2nR' n B(x 1 ) 

+ -B(x 1 ) , R n G21nAin^lKnGl2nR' n B(x 1 ) 



n 

= -S(xi)XS 2 < J B(x 1 )(l + o P (l)) 
n 

= OpdnKn)- 1 ). 
Similarly, ^[/o^ (^2) — /q 2 '( x 2)] can be calculated as 

= — -B(x 2 ) / A 2n 1 G 2 i n i? n (X 2 — G 2 i ra A lri 1 X 1 ) 
xS(X 2 - X 1 Aj n 1 Gi 2n )i?;Gi 2n A 2 - ri 1 S(x 2 ) 
= is(x 2 ) , A 2n 1 G 21n i? n S 2 <Gi 2n A 2n 1 S(x 2 )(l + o P (l)) 

= o P ((nK n )- 1 ). 



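Therefore, since

\[ |\hat f_j^{(\infty)}(x_j) - \hat f_{0j}^{(1)}(x_j)| = |E[\hat f_j^{(\infty)}(x_j) - \hat f_{0j}^{(1)}(x_j)]| + o_P\Big(\sqrt{V[\hat f_j^{(\infty)}(x_j) - \hat f_{0j}^{(1)}(x_j)]}\Big), \quad j = 1, 2, \]

we have

\[ |\hat f_1^{(\infty)}(x_1) - \hat f_{01}^{(1)}(x_1)| = O_P(K_n^{-1}) \]

and

\[ |\hat f_2^{(\infty)}(x_2) - \hat f_{02}^{(1)}(x_2)| = O_P(K_n^{-1}). \]
□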
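The expansion of the $\ell$-step backfitting estimator as a finite sum of powers of $S_1S_2$, on which the proof of Proposition 2 is built, can be checked on fitted vectors with a small numerical sketch. Everything below is an assumption made for illustration: the $3 \times 3$ matrices `s1` and `s2` are hypothetical smoothers (not penalized spline smoothers), the initial value of the second component is taken to be zero, and the helper names are not from the paper.

```python
def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def backfit_f1(s1, s2, y, steps):
    """First component after `steps` backfitting cycles, starting from f2 = 0."""
    f2 = [0.0] * len(y)
    for _ in range(steps):
        f1 = matvec(s1, [yi - f2i for yi, f2i in zip(y, f2)])
        f2 = matvec(s2, [yi - f1i for yi, f1i in zip(y, f1)])
    return f1


def closed_form_f1(s1, s2, y, steps):
    """{I - sum_{k=0}^{steps-1} (S1 S2)^k (I - S1)} y, evaluated term by term."""
    term = [yi - si for yi, si in zip(y, matvec(s1, y))]  # (I - S1) y
    s12 = matmul(s1, s2)
    total = [0.0] * len(y)
    for _ in range(steps):
        total = [t + u for t, u in zip(total, term)]
        term = matvec(s12, term)                          # next power of S1 S2
    return [yi - ti for yi, ti in zip(y, total)]


# Hypothetical smoothers and data, chosen only to exercise the identity.
s1 = [[0.5, 0.25, 0.0], [0.25, 0.5, 0.25], [0.0, 0.25, 0.5]]
s2 = [[0.4, 0.1, 0.1], [0.1, 0.4, 0.1], [0.1, 0.1, 0.4]]
y = [1.0, -2.0, 0.5]

for steps in range(1, 6):
    iterative = backfit_f1(s1, s2, y, steps)
    closed = closed_form_f1(s1, s2, y, steps)
    print(steps, max(abs(u - v) for u, v in zip(iterative, closed)))
```

The two computations agree up to rounding error for every number of steps, since the identity is purely algebraic. The proof applies the same expansion and then evaluates the fitted vector at a point through the B-spline basis.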



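Proof of Theorem 1. We see from Theorem 2 (a) of Claeskens et al. (2009) that

\[
\begin{aligned}
E[\hat f_{01}^{(1)}(x_1)] &= f_1(x_1) + b_{1,\lambda}(x_1) + O_P(K_n^{-(p+1)}) + o_P(\lambda_{1n}K_nn^{-1}), \\
V[\hat f_{01}^{(1)}(x_1)] &= V_1(x_1) + o_P(K_nn^{-1}).
\end{aligned}
\]

So we have

\[
\begin{aligned}
E[\hat f_1^{(\infty)}(x_1)] &= E[\hat f_{01}^{(1)}(x_1)] + O_P(K_n^{-1}) \\
&= f_1(x_1) + b_{1,\lambda}(x_1) + O_P(K_n^{-1}) + o_P(\lambda_{1n}K_nn^{-1}),
\end{aligned}
\]

and

\[
\begin{aligned}
V[\hat f_1^{(\infty)}(x_1)] &= V[\hat f_{01}^{(1)}(x_1)] + 2\mathrm{Cov}(\hat f_{01}^{(1)}(x_1), \theta_1) + V[\theta_1] \\
&= V_1(x_1) + o_P(K_nn^{-1}),
\end{aligned}
\]

where $\theta_1 = \hat f_1^{(\infty)}(x_1) - \hat f_{01}^{(1)}(x_1)$. This holds because the covariance term is bounded by
$\{V[\hat f_{01}^{(1)}(x_1)]V[\theta_1]\}^{1/2}$ by the Cauchy-Schwarz inequality
and $V[\theta_1] = o_P(K_nn^{-1})$ from the proof of Proposition 2. Furthermore, we also obtain

\[
\begin{aligned}
E[\hat f_2^{(\infty)}(x_2)] &= f_2(x_2) + b_{2,\lambda}(x_2) + O_P(K_n^{-1}) + o_P(\lambda_{2n}K_nn^{-1}), \\
V[\hat f_2^{(\infty)}(x_2)] &= V_2(x_2) + o_P(K_nn^{-1}).
\end{aligned}
\]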

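Finally, we calculate

\[ \mathrm{Cov}(\hat f_1^{(\infty)}(x_1), \hat f_2^{(\infty)}(x_2)) = \mathrm{Cov}(\hat f_{01}^{(1)}(x_1), \hat f_{02}^{(1)}(x_2)) + \sum_{j=1}^{2}\mathrm{Cov}(\hat f_{0j}^{(1)}(x_j), \theta_{3-j}) + \mathrm{Cov}(\theta_1, \theta_2), \]

where $\theta_2 = \hat f_2^{(\infty)}(x_2) - \hat f_{02}^{(1)}(x_2)$. Then we see that

\[
\begin{aligned}
\mathrm{Cov}(\hat f_{01}^{(1)}(x_1), \hat f_{02}^{(1)}(x_2)) &= \frac{1}{n^2}B(x_1)'\Lambda_{1n}^{-1}X_1'\Sigma X_2\Lambda_{2n}^{-1}B(x_2) \\
&\quad - \frac{1}{n^2}B(x_1)'\Lambda_{1n}^{-1}X_1'\Sigma X_1\Lambda_{1n}^{-1}G_{12n}\Lambda_{2n}^{-1}B(x_2) \\
&= O(n^{-1}K_nK_n^{-2}K_n)\{1 + o_P(1)\} \\
&= O_P(n^{-1})
\end{aligned}
\]

because the absolute value of the $(k,h)$-component of $n^{-1}X_1'\Sigma X_2$ is

\[
\begin{aligned}
\Big|\frac{1}{n}\sum_{i=1}^{n}\sigma^2(x_{i1}, x_{i2})B_{-p+k}(x_{i1})B_{-p+h}(x_{i2})\Big| &\le \max_{u,v}\{\sigma^2(u,v)\}\frac{1}{n}\sum_{i=1}^{n}B_{-p+k}(x_{i1})B_{-p+h}(x_{i2}) \\
&= O_P(K_n^{-2}).
\end{aligned}
\]

In addition, for $j = 1, 2$, we find

\[
\begin{aligned}
|\mathrm{Cov}(\hat f_{0j}^{(1)}(x_j), \theta_{3-j})| &\le V[\hat f_{0j}^{(1)}(x_j)]^{1/2}V[\theta_{3-j}]^{1/2} \\
&= o_P(n^{-1})
\end{aligned}
\]

and $|\mathrm{Cov}(\theta_1, \theta_2)| \le \{V[\theta_1]V[\theta_2]\}^{1/2} = o_P(n^{-1})$ from the proof of Proposition 2. This completes
the proof. □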



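Proof of Theorem 2. If we prove

\[ V\left[\begin{pmatrix}\hat f_1(x_1) \\ \hat f_2(x_2)\end{pmatrix}\right]^{-1/2}\left\{\begin{pmatrix}\hat f_1(x_1) \\ \hat f_2(x_2)\end{pmatrix} - \begin{pmatrix}E[\hat f_1(x_1)] \\ E[\hat f_2(x_2)]\end{pmatrix}\right\} \xrightarrow{D} N_2(0, I_2), \qquad (11) \]

then we have Theorem 2. We rewrite $\hat f_j(x_j)$ as $\sum_{i=1}^{n}w_{i,jn}y_i$. For any $(a_1\ a_2)' \in \mathbb{R}^2 \setminus \{0\}$, we
check that

\[ S_n = a_1\hat f_1(x_1) + a_2\hat f_2(x_2) = \sum_{i=1}^{n}(a_1w_{i,1n} + a_2w_{i,2n})y_i \]

satisfies the required Lyapunov condition. First, we obtain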

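\[ V[S_n] = a_1^2V[\hat f_1(x_1)] + 2a_1a_2\mathrm{Cov}(\hat f_1(x_1), \hat f_2(x_2)) + a_2^2V[\hat f_2(x_2)] = O(K_nn^{-1}) \]

by Theorem 1. Next we note that the leading term of $w_{i,jn}$ is

\[ \frac{1}{n}B(x_j)'\Lambda_{jn}^{-1}B(x_{ij})\{1 + o_P(1)\}, \quad j = 1, 2, \]

because it is the $i$th component of $B(x_j)'\Lambda_{jn}^{-1}n^{-1}X_j'$, with $X_j = (B_k(x_{ij}))_{ik}$. Hence we have

\[ w_{i,jn} = O_P(K_nn^{-1}) \qquad (12) \]

and

\[
\begin{aligned}
E[|\{a_1w_{i,1n} + a_2w_{i,2n}\}y_i - \{a_1w_{i,1n} + a_2w_{i,2n}\}E[y_i]|^{2+\delta}] &= |a_1w_{i,1n} + a_2w_{i,2n}|^{2+\delta}E[|\varepsilon_i|^{2+\delta}] \\
&= O_P\Big(\frac{K_n^{2+\delta}}{n^{2+\delta}}\Big). \qquad (13)
\end{aligned}
\]

So it follows from (12), (13) and $K_n = O(n^{\gamma})$ that

\[
\begin{aligned}
\frac{1}{V[S_n]^{(2+\delta)/2}}\sum_{i=1}^{n}E[|\{a_1w_{i,1n} + a_2w_{i,2n}\}y_i - \{a_1w_{i,1n} + a_2w_{i,2n}\}E[y_i]|^{2+\delta}] &= O_P(K_n^{-(2+\delta)/2}n^{(2+\delta)/2})\,O(n)\,O_P\Big(\frac{K_n^{2+\delta}}{n^{2+\delta}}\Big) \\
&= O_P(n^{\gamma(1+\delta/2)-\delta/2}).
\end{aligned}
\]

Therefore, for $\delta > 2\gamma/(1 - \gamma)$,

\[ \frac{1}{V[S_n]^{(2+\delta)/2}}\sum_{i=1}^{n}E[|\{a_1w_{i,1n} + a_2w_{i,2n}\}y_i - \{a_1w_{i,1n} + a_2w_{i,2n}\}E[y_i]|^{2+\delta}] = o_P(1). \]

By the Lyapunov theorem and the Cramér-Wold device, we get (11).

Consequently, since the asymptotic bias of $\hat f_j(x_j)$ is $b_{j,\lambda}(x_j) = O_P(\lambda_{jn}K_nn^{-1}) = o_P(K_n^{1/2}n^{-1/2})$,
Theorem 2 has been obtained. □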
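The arithmetic behind the condition $\delta > 2\gamma/(1-\gamma)$ in the Lyapunov step can be made explicit with a short sketch. The function names and the rate $\gamma = 1/5$ below are assumptions chosen for illustration only; the code simply evaluates the exponent $\gamma(1 + \delta/2) - \delta/2$ of $n$ that appears in the bound of the Lyapunov ratio.

```python
from fractions import Fraction


def lyapunov_exponent(gamma, delta):
    """Exponent e such that the Lyapunov ratio is O_P(n**e)."""
    return gamma * (1 + delta / 2) - delta / 2


def delta_threshold(gamma):
    """Value of delta at which the exponent is exactly zero: 2*gamma/(1 - gamma)."""
    return 2 * gamma / (1 - gamma)


gamma = Fraction(1, 5)  # illustrative rate K_n = O(n^(1/5)); not taken from the paper
threshold = delta_threshold(gamma)
for delta in (Fraction(1, 4), threshold, Fraction(1)):
    print(delta, lyapunov_exponent(gamma, delta))
```

With exact rational arithmetic the exponent is positive below the threshold, zero at it, and negative above it, which is why the ratio vanishes only for $\delta > 2\gamma/(1-\gamma)$.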



Proof of Theorem 3. We show that $H(L)$ becomes positive definite as $n \to \infty$. Now, $H(L)$
is divided into

\[ H(L) = \begin{pmatrix}X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2\end{pmatrix} + \begin{pmatrix}\lambda_{1n}Q_m & O \\ O & \lambda_{2n}Q_m\end{pmatrix} \equiv H_1 + H_2. \]

It is known that $Q_m$ has eigenvalue 0, hence $H_2$ is nonnegative definite. We show that
$H_1$ becomes positive definite as $n \to \infty$. For any $z \in \mathbb{R}^{2(K_n+p)}$ with $z'z = 1$,

\[
\begin{aligned}
\frac{K_n}{n}z'H_1z &= z'\begin{pmatrix}K_nG_1 & O \\ O & K_nG_2\end{pmatrix}z + z'\begin{pmatrix}K_n(G_{1n} - G_1) & K_nG_{12n} \\ K_nG_{21n} & K_n(G_{2n} - G_2)\end{pmatrix}z \\
&\equiv z'H_{11}z + z'H_{12}z.
\end{aligned}
\]

By (6.10) of Agarwal and Studden (1980), we can find $z'H_{11}z > 0$. Now we show $z'H_{12}z = o_P(1)$.
We write $z = (z_1'\ z_2')'$, where $z_j \in \mathbb{R}^{K_n+p}$. Then since $z'z = z_1'z_1 + z_2'z_2 = 1$,
$z_j'z_j \le 1$ $(j = 1, 2)$. So we get

\[ z'H_{12}z = z_1'K_n(G_{1n} - G_1)z_1 + 2z_1'K_nG_{12n}z_2 + z_2'K_n(G_{2n} - G_2)z_2. \]

Since the maximum eigenvalue of $K_n(G_{jn} - G_j)$ is $o_P(1)$ from Lemma 6, we have

\[ z_j'K_n(G_{jn} - G_j)z_j = z_j'z_j\,\frac{z_j'K_n(G_{jn} - G_j)z_j}{z_j'z_j} = o_P(1), \quad j = 1, 2. \]

Similarly, we get

\[
\begin{aligned}
|z_1'K_nG_{12n}z_2| &= \sqrt{z_1'z_1}\sqrt{z_2'z_2}\,\frac{|z_1'K_nG_{12n}z_2|}{\sqrt{z_1'z_1}\sqrt{z_2'z_2}} \\
&\le \sqrt{z_1'z_1}\sqrt{z_2'z_2}\,o_P(1) \\
&= o_P(1)
\end{aligned}
\]

because the maximum eigenvalue of $K_nG_{12n}$ is also $o_P(1)$, as shown in Lemma 6. The above evaluations
are combined into $z'H_{12}z = o_P(1)$. Consequently, we obtain

\[
\begin{aligned}
z'H_1z &= \frac{n}{K_n}(z'H_{11}z + z'H_{12}z) \\
&= \frac{n}{K_n}z'H_{11}z\Big(1 + \frac{z'H_{12}z}{z'H_{11}z}\Big) \\
&= \frac{n}{K_n}z'H_{11}z(1 + o_P(1)) \\
&> 0.
\end{aligned}
\]
□



References 

[1] Agarwal, G. and Studden, W. (1980). Asymptotic integrated mean square error using least
squares and bias minimizing splines. Ann. Statist. 8, 1307-1325.

[2] Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models (with
discussion). Ann. Statist. 17, 453-555.

[3] Claeskens, G., Krivobokova, T. and Opsomer, J.D. (2009). Asymptotic properties of penalized
spline estimators. Biometrika 96, 529-544.

[4] de Boor, C. (2001). A Practical Guide to Splines. Springer-Verlag.

[5] Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties (with
discussion). Statist. Sci. 11, 89-121.

[6] Green, P.J. and Silverman, B.W. (1994). Nonparametric Regression and Generalized Linear
Models: A Roughness Penalty Approach. Monographs on Statistics and Applied Probability
58. London: Chapman & Hall.






[7] Hall, P. and Opsomer, J.D. (2005). Theory for penalized spline regression. Biometrika
92, 105-118.

[8] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. London: Chapman & Hall.

[9] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning.
Springer-Verlag.

[10] Kauermann, G., Krivobokova, T. and Fahrmeir, L. (2009). Some asymptotic results on
generalized penalized spline smoothing. J. R. Statist. Soc. B 71, 487-503.

[11] Marx, B.D. and Eilers, P.H.C. (1998). Direct generalized additive modeling with penalized
likelihood. Comp. Statist. & Data Anal. 28, 193-209.

[12] Opsomer, J.D. (2000). Asymptotic properties of backfitting estimators. J. Mult. Anal. 73,
166-79.

[13] Opsomer, J.D. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial
regression. Ann. Statist. 25, 186-211.

[14] O'Sullivan, F. (1986). A statistical perspective on ill-posed inverse problems (with
discussion). Statist. Sci. 1, 505-27.

[15] Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression. Cambridge
University Press.

[16] Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method
for kernel density estimation. J. R. Statist. Soc. 53, 683-690.

[17] Stone, C.J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13,
689-705.

[18] Wand, M.P. (1999). A central limit theorem for local polynomial backfitting estimators. J.
Mult. Anal. 70, 57-65.

[19] Wang, L. and Yang, L. (2007). Spline-backfitted kernel smoothing of nonlinear additive
autoregression model. Ann. Statist. 35, 2474-2503.

[20] Wang, X., Shen, J. and Ruppert, D. (2011). On the asymptotics of penalized spline smoothing.
Ele. J. Statist. 5, 1-17.

[21] Wahba, G. (1975). Smoothing noisy data with spline functions. Numer. Math. 24, 383-93.

[22] Zhou, S., Shen, X. and Wolfe, D.A. (1998). Local asymptotics for regression splines and
confidence regions. Ann. Statist. 26, 1760-1782.






